Overview

Dataset statistics

Number of variables: 9
Number of observations: 768
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 54.1 KiB
Average record size in memory: 72.2 B

Variable types

NUM: 8
BOOL: 1

Reproduction

Analysis started: 2020-10-11 22:18:27.106794
Analysis finished: 2020-10-11 22:18:39.695974
Duration: 12.59 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Pregnancies has 111 (14.5%) zeros
BloodPressure has 35 (4.6%) zeros
SkinThickness has 227 (29.6%) zeros
Insulin has 374 (48.7%) zeros
BMI has 11 (1.4%) zeros
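
A zeros warning of this kind can be reproduced by counting exact zeros in a column. The sketch below is a minimal illustration in plain Python, not the profiler's own implementation. `zero_summary` is a hypothetical helper, and the column is synthetic data built only to match the Pregnancies counts reported above.

```python
def zero_summary(values):
    """Return (number of zeros, percentage of zeros) for a numeric column."""
    n_zeros = sum(1 for v in values if v == 0)
    pct = 100.0 * n_zeros / len(values) if values else 0.0
    return n_zeros, pct


# Synthetic stand-in (an assumption): 111 zeros among 768 observations
column = [0] * 111 + [1] * 657
n_zeros, pct = zero_summary(column)
print(f"{n_zeros} ({pct:.1f}%) zeros")  # 111 (14.5%) zeros
```

Whether a zero is a valid measurement (as for Pregnancies) or a stand-in for a missing value is a judgment the count alone cannot make.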

Variables

Pregnancies
Real number (ℝ≥0)

ZEROS

Distinct count: 17
Unique (%): 2.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.8450520833333335
Minimum: 0
Maximum: 17
Zeros: 111
Zeros (%): 14.5%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 3
Q3: 6
95-th percentile: 10
Maximum: 17
Range: 17
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.369578063
Coefficient of variation (CV): 0.8763413316
Kurtosis: 0.1592197775
Mean: 3.845052083
Median Absolute Deviation (MAD): 2
Skewness: 0.9016739792
Sum: 2953
Variance: 11.35405632
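
Several of the figures above are derived from others in the same summary. The following sketch recomputes them from the reported Pregnancies values as a consistency check. The variable names are illustrative, and the formulas are the standard definitions, assumed here rather than taken from the profiler's source.

```python
# Figures copied from the Pregnancies summary in this report
minimum, maximum = 0, 17
q1, q3 = 1, 6
mean = 3.845052083
std = 3.369578063

value_range = maximum - minimum  # 17
iqr = q3 - q1                    # 5
cv = std / mean                  # coefficient of variation, about 0.8763
variance = std ** 2              # about 11.354

print(value_range, iqr, round(cv, 4), round(variance, 3))
```

The same relationships apply to the other numeric variables in this report.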
Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
1                   135    17.6%
0                   111    14.5%
2                   103    13.4%
3                   75     9.8%
4                   68     8.9%
5                   57     7.4%
6                   50     6.5%
7                   45     5.9%
8                   38     4.9%
9                   28     3.6%
Other values (7)    58     7.6%

Value               Count  Frequency (%)
0                   111    14.5%
1                   135    17.6%
2                   103    13.4%
3                   75     9.8%
4                   68     8.9%

Value               Count  Frequency (%)
17                  1      0.1%
15                  1      0.1%
14                  2      0.3%
13                  10     1.3%
12                  9      1.2%

Glucose
Real number (ℝ≥0)

Distinct count: 136
Unique (%): 17.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 120.89453125
Minimum: 0
Maximum: 199
Zeros: 5
Zeros (%): 0.7%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 79
Q1: 99
median: 117
Q3: 140.25
95-th percentile: 181
Maximum: 199
Range: 199
Interquartile range (IQR): 41.25

Descriptive statistics

Standard deviation: 31.9726182
Coefficient of variation (CV): 0.2644670347
Kurtosis: 0.6407798204
Mean: 120.8945312
Median Absolute Deviation (MAD): 20
Skewness: 0.1737535018
Sum: 92847
Variance: 1022.248314
Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
100                 17     2.2%
99                  17     2.2%
129                 14     1.8%
125                 14     1.8%
111                 14     1.8%
106                 14     1.8%
95                  13     1.7%
108                 13     1.7%
105                 13     1.7%
102                 13     1.7%
Other values (126)  626    81.5%

Value               Count  Frequency (%)
0                   5      0.7%
44                  1      0.1%
56                  1      0.1%
57                  2      0.3%
61                  1      0.1%

Value               Count  Frequency (%)
199                 1      0.1%
198                 1      0.1%
197                 4      0.5%
196                 3      0.4%
195                 2      0.3%

BloodPressure
Real number (ℝ≥0)

ZEROS

Distinct count: 47
Unique (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 69.10546875
Minimum: 0
Maximum: 122
Zeros: 35
Zeros (%): 4.6%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 38.7
Q1: 62
median: 72
Q3: 80
95-th percentile: 90
Maximum: 122
Range: 122
Interquartile range (IQR): 18

Descriptive statistics

Standard deviation: 19.35580717
Coefficient of variation (CV): 0.2800908166
Kurtosis: 5.18015656
Mean: 69.10546875
Median Absolute Deviation (MAD): 8
Skewness: -1.843607983
Sum: 53073
Variance: 374.6472712
Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
70                  57     7.4%
74                  52     6.8%
68                  45     5.9%
78                  45     5.9%
72                  44     5.7%
64                  43     5.6%
80                  40     5.2%
76                  39     5.1%
60                  37     4.8%
0                   35     4.6%
Other values (37)   331    43.1%

Value               Count  Frequency (%)
0                   35     4.6%
24                  1      0.1%
30                  2      0.3%
38                  1      0.1%
40                  1      0.1%

Value               Count  Frequency (%)
122                 1      0.1%
114                 1      0.1%
110                 3      0.4%
108                 2      0.3%
106                 3      0.4%

SkinThickness
Real number (ℝ≥0)

ZEROS

Distinct count: 51
Unique (%): 6.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20.536458333333332
Minimum: 0
Maximum: 99
Zeros: 227
Zeros (%): 29.6%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 23
Q3: 32
95-th percentile: 44
Maximum: 99
Range: 99
Interquartile range (IQR): 32

Descriptive statistics

Standard deviation: 15.95221757
Coefficient of variation (CV): 0.776775494
Kurtosis: -0.5200718662
Mean: 20.53645833
Median Absolute Deviation (MAD): 12
Skewness: 0.1093724965
Sum: 15772
Variance: 254.4732453
Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
0                   227    29.6%
32                  31     4.0%
30                  27     3.5%
27                  23     3.0%
23                  22     2.9%
33                  20     2.6%
18                  20     2.6%
28                  20     2.6%
31                  19     2.5%
39                  18     2.3%
Other values (41)   341    44.4%

Value               Count  Frequency (%)
0                   227    29.6%
7                   2      0.3%
8                   2      0.3%
10                  5      0.7%
11                  6      0.8%

Value               Count  Frequency (%)
99                  1      0.1%
63                  1      0.1%
60                  1      0.1%
56                  1      0.1%
54                  2      0.3%

Insulin
Real number (ℝ≥0)

ZEROS

Distinct count: 186
Unique (%): 24.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 79.79947916666667
Minimum: 0
Maximum: 846
Zeros: 374
Zeros (%): 48.7%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 30.5
Q3: 127.25
95-th percentile: 293
Maximum: 846
Range: 846
Interquartile range (IQR): 127.25

Descriptive statistics

Standard deviation: 115.2440024
Coefficient of variation (CV): 1.444169856
Kurtosis: 7.214259554
Mean: 79.79947917
Median Absolute Deviation (MAD): 30.5
Skewness: 2.272250858
Sum: 61286
Variance: 13281.18008
Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
0                   374    48.7%
105                 11     1.4%
140                 9      1.2%
130                 9      1.2%
120                 8      1.0%
100                 7      0.9%
94                  7      0.9%
180                 7      0.9%
110                 6      0.8%
115                 6      0.8%
Other values (176)  324    42.2%

Value               Count  Frequency (%)
0                   374    48.7%
14                  1      0.1%
15                  1      0.1%
16                  1      0.1%
18                  2      0.3%

Value               Count  Frequency (%)
846                 1      0.1%
744                 1      0.1%
680                 1      0.1%
600                 1      0.1%
579                 1      0.1%

BMI
Real number (ℝ≥0)

ZEROS

Distinct count: 248
Unique (%): 32.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 31.992578124999998
Minimum: 0.0
Maximum: 67.1
Zeros: 11
Zeros (%): 1.4%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 21.8
Q1: 27.3
median: 32
Q3: 36.6
95-th percentile: 44.395
Maximum: 67.1
Range: 67.1
Interquartile range (IQR): 9.3

Descriptive statistics

Standard deviation: 7.88416032
Coefficient of variation (CV): 0.2464371671
Kurtosis: 3.290442901
Mean: 31.99257812
Median Absolute Deviation (MAD): 4.6
Skewness: -0.4289815885
Sum: 24570.3
Variance: 62.15998396
Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
32                  13     1.7%
31.6                12     1.6%
31.2                12     1.6%
0                   11     1.4%
33.3                10     1.3%
32.4                10     1.3%
32.8                9      1.2%
30.8                9      1.2%
32.9                9      1.2%
30.1                9      1.2%
Other values (238)  664    86.5%

Value               Count  Frequency (%)
0                   11     1.4%
18.2                3      0.4%
18.4                1      0.1%
19.1                1      0.1%
19.3                1      0.1%

Value               Count  Frequency (%)
67.1                1      0.1%
59.4                1      0.1%
57.3                1      0.1%
55                  1      0.1%
53.2                1      0.1%

DPF
Real number (ℝ≥0)

Distinct count: 517
Unique (%): 67.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.47187630208333325
Minimum: 0.078
Maximum: 2.42
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 0.078
5-th percentile: 0.14035
Q1: 0.24375
median: 0.3725
Q3: 0.62625
95-th percentile: 1.13285
Maximum: 2.42
Range: 2.342
Interquartile range (IQR): 0.3825

Descriptive statistics

Standard deviation: 0.331328595
Coefficient of variation (CV): 0.7021513764
Kurtosis: 5.594953528
Mean: 0.4718763021
Median Absolute Deviation (MAD): 0.1675
Skewness: 1.919911066
Sum: 362.401
Variance: 0.1097786379
Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
0.254               6      0.8%
0.258               6      0.8%
0.259               5      0.7%
0.238               5      0.7%
0.207               5      0.7%
0.268               5      0.7%
0.261               5      0.7%
0.167               4      0.5%
0.19                4      0.5%
0.27                4      0.5%
Other values (507)  719    93.6%

Value               Count  Frequency (%)
0.078               1      0.1%
0.084               1      0.1%
0.085               2      0.3%
0.088               2      0.3%
0.089               1      0.1%

Value               Count  Frequency (%)
2.42                1      0.1%
2.329               1      0.1%
2.288               1      0.1%
2.137               1      0.1%
1.893               1      0.1%

Age
Real number (ℝ≥0)

Distinct count: 52
Unique (%): 6.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.240885416666664
Minimum: 21
Maximum: 81
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.0 KiB

Quantile statistics

Minimum: 21
5-th percentile: 21
Q1: 24
median: 29
Q3: 41
95-th percentile: 58
Maximum: 81
Range: 60
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 11.76023154
Coefficient of variation (CV): 0.3537881556
Kurtosis: 0.6431588885
Mean: 33.24088542
Median Absolute Deviation (MAD): 7
Skewness: 1.129596701
Sum: 25529
Variance: 138.3030459
Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
22                  72     9.4%
21                  63     8.2%
25                  48     6.2%
24                  46     6.0%
23                  38     4.9%
28                  35     4.6%
26                  33     4.3%
27                  32     4.2%
29                  29     3.8%
31                  24     3.1%
Other values (42)   348    45.3%

Value               Count  Frequency (%)
21                  63     8.2%
22                  72     9.4%
23                  38     4.9%
24                  46     6.0%
25                  48     6.2%

Value               Count  Frequency (%)
81                  1      0.1%
72                  1      0.1%
70                  1      0.1%
69                  2      0.3%
68                  1      0.1%

Outcome
Boolean

Distinct count: 2
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.0 KiB

Value               Count  Frequency (%)
0                   500    65.1%
1                   268    34.9%

Interactions

[64 interaction plots omitted; the figures are not available in this text version.]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
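
The calculation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code the report generator uses: `pearson_r` is a hypothetical helper, the sample (n - 1) denominators are a choice that cancels in the ratio, and the data are made up for the example.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)  # raises ZeroDivisionError for a constant input


# Illustrative data only (not taken from the dataset)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(round(pearson_r(x, y), 4))  # 0.7746
```

Shifting or positively rescaling either input, for example replacing each x with 2x + 10, leaves the result unchanged, which is the invariance property noted above.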

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
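
A minimal sketch of that rank-based calculation follows. It assumes tied values receive the average of their positions, which is a common convention but not one stated in this report. `average_ranks` and `spearman_rho` are hypothetical helper names, and the data are made up.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values equal to the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    # The 1/(n - 1) factors of covariance and standard deviations cancel
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# A nonlinear but strictly increasing relationship (illustrative data)
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(round(spearman_rho(x, y), 6))  # 1.0
```

Because only the ordering matters, the cubic relationship in the example scores as a perfect monotonic correlation even though it is not linear.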

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
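
The pair-counting definition above corresponds to the variant usually called tau-a, sketched below in plain Python. `kendall_tau_a` is a hypothetical helper and the data are made up. Library implementations often use a tie-corrected variant (tau-b), which can give different values when ties are present, so this sketch should not be read as the exact computation behind the report.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1  # both variables order the pair the same way
        elif s < 0:
            discordant += 1  # the variables order the pair in opposite ways
    return (concordant - discordant) / len(pairs)


# Illustrative data: 5 concordant pairs and 1 discordant pair out of 6
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(round(kendall_tau_a(x, y), 4))  # 0.6667
```

This direct version compares every pair and therefore runs in quadratic time, which is fine for a dataset of 768 rows.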

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

Missing values

[Two missing-value plots omitted; the figures are not available in this text version.]

Sample

First rows

     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin  BMI   DPF    Age  Outcome
0    6            148      72             35             0        33.6  0.627  50   1
1    1            85       66             29             0        26.6  0.351  31   0
2    8            183      64             0              0        23.3  0.672  32   1
3    1            89       66             23             94       28.1  0.167  21   0
4    0            137      40             35             168      43.1  2.288  33   1
5    5            116      74             0              0        25.6  0.201  30   0
6    3            78       50             32             88       31.0  0.248  26   1
7    10           115      0              0              0        35.3  0.134  29   0
8    2            197      70             45             543      30.5  0.158  53   1
9    8            125      96             0              0        0.0   0.232  54   1

Last rows

     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin  BMI   DPF    Age  Outcome
758  1            106      76             0              0        37.5  0.197  26   0
759  6            190      92             0              0        35.5  0.278  66   1
760  2            88       58             26             16       28.4  0.766  22   0
761  9            170      74             31             0        44.0  0.403  43   1
762  9            89       62             0              0        22.5  0.142  33   0
763  10           101      76             48             180      32.9  0.171  63   0
764  2            122      70             27             0        36.8  0.340  27   0
765  5            121      72             23             112      26.2  0.245  30   0
766  1            126      60             0              0        30.1  0.349  47   1
767  1            93       70             31             0        30.4  0.315  23   0